KMID : 1022420190110040089
Phonetics and Speech Sciences
2019, Volume 11, No. 4, pp. 89-96
Speech detection from broadcast contents using multi-scale time-dilated convolutional neural networks
Jang Byeong-Yong

Kwon Oh-Wook
Abstract
In this paper, we propose a deep learning architecture that effectively detects speech segments in broadcast content. We also propose a multi-scale time-dilated layer for learning the temporal changes of feature vectors. To verify the performance of the proposed model, we implement several comparison models and calculate the frame-by-frame F-score, precision, and recall. The proposed model and the comparison models are all trained on the same data: 32 hours of Korean broadcast data covering various genres (drama, news, documentary, and so on). The proposed model shows the best performance on Korean broadcast data, with an F-score of 91.7%. It also shows the highest performance on British and Spanish broadcast data, with F-scores of 87.9% and 92.6%, respectively. These results indicate that learning the temporal changes of the feature vectors can contribute to improved speech detection performance.
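The frame-by-frame evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each frame carries a binary label (1 = speech, 0 = non-speech) and uses the standard definitions of precision, recall, and F-score. The function name `frame_metrics` and the toy label sequences are hypothetical and are not taken from the paper.

```python
def frame_metrics(reference, predicted):
    """Return (precision, recall, f_score) for binary frame labels.

    Assumption: 1 marks a speech frame and 0 a non-speech frame.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(reference, predicted))
    # Speech frames detected as speech.
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    # Non-speech frames wrongly detected as speech.
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    # Speech frames that were missed.
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy example (made-up labels, not data from the paper).
reference = [0, 1, 1, 1, 0, 0, 1, 1]
predicted = [0, 1, 1, 0, 0, 1, 1, 1]
precision, recall, f_score = frame_metrics(reference, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F-score={f_score:.3f}")
```

In this toy case there are four correctly detected speech frames, one false alarm, and one missed frame, so all three measures come out at 0.8.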
KEYWORD
speech detection, multi-scale time-dilated convolution, deep learning, broadcast data
Listed journal information
Korea Research Foundation (KCI)